AmritaNLP@PAN-RusProfiling: Author Profiling using Machine Learning Techniques
Abstract
This paper describes our work on the "Gender Identification in Russian texts (RusProfiling)" shared task, hosted by PAN in conjunction with FIRE 2017. The task is to predict an author's gender from a Russian-language Twitter corpus. We give a brief introduction to the task, describe the data set provided by the competition organizers, discuss various feature selection methods, present the experimental analysis we followed for feature representation, and compare the outcomes of the different classifiers we used for validation. We submitted a total of 3 models, with their respective predictions for each test data set, each using a slightly different pre-processing technique chosen according to the content of the test corpus. Because the test corpora were sourced from various platforms, it was challenging to stick to a single representation. According to the global ranking published for the shared task [6], our team secured 2nd position overall (on the concatenation of all data sets), and our 3rd submission performed best among our 3 submitted models on the overall test corpus. Finally, under extended work, we briefly discuss how hyperparameter tuning of certain attributes extends our validation accuracy by 6% over the baseline.
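The workflow summarized in the abstract (choose a feature representation, train a classifier, tune hyperparameters against validation accuracy) can be sketched roughly as follows. This is only an illustration: the abstract does not say which features or classifiers the authors used, so the character n-gram features, the Naive Bayes classifier, the helper names (`char_ngrams`, `NaiveBayes`, `tune`) and the four toy training sentences are all assumptions made for this sketch. The toy sentences rely on the fact that Russian first-person past-tense verbs agree with the speaker's gender (for example "пошла" versus "пошёл").

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n):
    """Return overlapping character n-grams of a lowercased, space-padded text."""
    padded = " " + text.lower() + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayes:
    """Multinomial Naive Bayes over character n-grams (illustrative only)."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n          # n-gram length (hyperparameter)
        self.alpha = alpha  # additive smoothing strength (hyperparameter)

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)
        self.docs = Counter(labels)
        for text, label in zip(texts, labels):
            self.counts[label].update(char_ngrams(text, self.n))
        self.vocab = {g for c in self.counts.values() for g in c}
        return self

    def predict(self, text):
        total_docs = sum(self.docs.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.docs):
            counts = self.counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(self.docs[label] / total_docs)
            for gram in char_ngrams(text, self.n):
                if gram in self.vocab:  # n-grams unseen in training are ignored
                    score += math.log((counts[gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the given label."""
    hits = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(texts)


def tune(train, valid, grid):
    """Try each (n, alpha) pair and return the best pair with its validation accuracy."""
    best_params, best_acc = None, -1.0
    for n, alpha in grid:
        model = NaiveBayes(n=n, alpha=alpha).fit(*train)
        acc = accuracy(model, *valid)
        if acc > best_acc:
            best_params, best_acc = (n, alpha), acc
    return best_params, best_acc


# Toy data invented for this sketch; it is not taken from the shared-task corpus.
train_texts = ["я пошла домой", "я пошёл домой", "я купила хлеб", "я купил хлеб"]
train_labels = ["female", "male", "female", "male"]
valid_texts = ["я сказала правду", "я сказал правду"]
valid_labels = ["female", "male"]

grid = [(1, 1.0), (2, 1.0), (3, 1.0)]
best_params, best_acc = tune((train_texts, train_labels), (valid_texts, valid_labels), grid)
print(best_params, best_acc)
```

On a real corpus the grid would normally be searched with cross-validation rather than a single two-sentence validation split, and the cross-genre test sets mentioned in the abstract would likely need their own pre-processing before the same features apply.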
Similar resources
The Winning Approach to Cross-Genre Gender Identification in Russian at RUSProfiling 2017
We present the CIC systems submitted to the 2017 PAN shared task on Cross-Genre Gender Identification in Russian texts (RUSProfiling). We submitted five systems. One of them was based on a statistical approach using only lexical features, and the other four on machine-learning techniques using some combinations of gender-specific Russian grammatical features, word and character n-grams, and suffix n...
Segmenting Target Audiences: Automatic Author Profiling using Tweets: Notebook for PAN at CLEF 2015
This paper describes a methodology proposed for author profiling using natural language processing and machine learning techniques. We used lexical information in the learning process. For those languages without lexicons, we automatically translated them, in order to be able to use this information. Finally, we will discuss how we applied this methodology to the 3rd Author Profiling Task at PA...
Overview of the RUSProfiling PAN at FIRE Track on Cross-genre Gender Identification in Russian
Author profiling consists of predicting an author's traits (e.g. age, gender, personality) from her writing. After addressing mainly age and gender identification at PAN@CLEF, in this RusProfiling PAN@FIRE track we have addressed the problem of predicting an author's gender in Russian from a cross-genre perspective: given a training set on Twitter, the systems have been evaluated on five differe...
Representation of Target Classes for Text Classification - AMRITA_CEN_NLP@RusProfiling PAN 2017
This working note describes the system we used while participating in RusProfiling PAN 2017 shared task. The objective of the task is to identify the gender trait of the author from the author’s text written in the Russian Language. Taking this as a binary text classification problem, we have experimented to develop a representation scheme for target classes (called class vectors) from the text...
Author Profiling with Word+Character Neural Attention Network
This paper describes neural network models that we prepared for the author profiling task of PAN@CLEF 2017. In previous PAN series, statistical models using a machine learning method with a variety of features have shown superior performances in author profiling tasks. We decided to tackle the author profiling task using neural networks. Neural networks have recently shown promising results in ...